
Sorting and Relevance

Sorting

Sorting by Field Values

  • _score is not calculated when sorting by a field
    • calculating the score is expensive, and it is not needed for the ordering
  • the sort value is returned with each hit

Single Value

GET /_search

{
...
"sort": {
"date": {
"order": "desc"
}
}
}
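When a search is sorted on a date field, each hit carries a `sort` array holding the value used for ordering, and `_score` is null because no score was computed. For date fields that value is milliseconds since the epoch. The sketch below uses a hand-written, hypothetical hit and a hypothetical helper `date_sort_value` to show the shape to expect; it is an illustration, not real cluster output.

```python
from datetime import datetime, timezone


def date_sort_value(iso_date: str) -> int:
    """Return epoch milliseconds for a UTC date given as YYYY-MM-DD."""
    dt = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Hypothetical hit, written by hand to mirror a field-sorted response.
hit = {
    "_id": "1",
    "_score": None,  # no score is calculated when sorting by a field
    "_source": {"date": "2014-09-24"},
    "sort": [date_sort_value("2014-09-24")],  # the value the hit was ordered by
}

print(hit["sort"])
```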

Multilevel sorting

{
...
"sort": [
{
"date": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
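As a rough illustration of what multilevel sorting means, the sketch below orders a few in-memory hits by date first and uses `_score` only to break ties between hits with the same date. The hit data and the `multilevel_sort` helper are hypothetical; Elasticsearch performs this ordering itself, not in client code.

```python
# Hypothetical hits: two share a date, so _score decides their relative order.
hits = [
    {"_id": "a", "date": "2014-09-22", "_score": 0.9},
    {"_id": "b", "date": "2014-09-24", "_score": 0.3},
    {"_id": "c", "date": "2014-09-24", "_score": 0.7},
]


def multilevel_sort(hits):
    """Sort by date descending, then by _score descending."""
    # ISO dates compare correctly as strings; reverse=True makes both levels desc.
    return sorted(hits, key=lambda h: (h["date"], h["_score"]), reverse=True)


ordered_ids = [h["_id"] for h in multilevel_sort(hits)]
print(ordered_ids)
```

Hit "a" has the highest score but the oldest date, so it comes last: the second sort level only matters when the first level ties.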

Multivalue sorting

{
...
"sort": [
{
"dates": {
"order": "desc",
"mode": "min" // one of: min, max, avg, sum
}
}
]
}
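A field with several values has to be reduced to a single value before documents can be ordered, which is what the sort mode selects. The sketch below mimics that reduction in plain Python; the document data, the `MODES` table, and the `order_desc` helper are hypothetical, and small integers stand in for dates.

```python
from statistics import mean

# Each sort mode maps to the function that reduces a list to one value.
MODES = {"min": min, "max": max, "avg": mean, "sum": sum}

# Hypothetical documents, each with a multivalue field.
docs = {"doc1": [3, 9], "doc2": [5, 6], "doc3": [1, 20]}


def order_desc(docs, mode):
    """Return document ids sorted descending by the reduced field value."""
    reduce_values = MODES[mode]
    return sorted(docs, key=lambda doc_id: reduce_values(docs[doc_id]), reverse=True)


for mode in MODES:
    print(mode, order_desc(docs, mode))
```

With `min` the order is doc2, doc1, doc3, while `max` puts doc3 first, so the chosen mode can change the result order completely.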

String Sorting

  • An analyzed string field cannot be sorted in lexicographical order, because it is indexed as separate terms rather than as one value
  • Use a multifield mapping => the field is not stored twice but is indexed in 2 different ways
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
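To see why the analyzed field sorts badly, the sketch below compares sorting on a single term of the analyzed text with sorting on the untouched string. The `analyze` function is a crude stand-in for a real analyzer (it only lowercases and splits on whitespace), and the sample strings are hypothetical.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


tweets = ["fine old art", "zebra crossing ahead", "big apple"]

# Analyzed field: many terms per document, so one term (the smallest here)
# ends up acting as the sort key.
by_analyzed = sorted(tweets, key=lambda t: min(analyze(t)))

# Not-analyzed field: the whole string is one value and sorts as expected.
by_raw = sorted(tweets)

print(by_analyzed)
print(by_raw)
```

With the mapping above, a search would match on `tweet` and sort on `tweet.raw` to get the second ordering.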

Relevance

  • Relevance score depends on the type of query clause
  • Standard algorithm used to calculate relevance => Term Frequency / Inverse Document Frequency (TF / IDF)
Term                        Description
Term frequency              How often the term appears in the field (more often => more relevant)
Inverse document frequency  How often the term appears in the index (more often => less relevant)
Field-length norm           How long the field is (longer => less relevant)
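The three factors can be sketched as small functions. The formulas below are those of Lucene's classic TF/IDF similarity as commonly documented, and they only apply if the index uses that similarity; the function names are hypothetical. Lucene also stores the norm in a lossy, compressed form, so real values may be rounded.

```python
import math


def tf(freq):
    """Term frequency: grows with the number of occurrences in the field."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency: shrinks as more documents contain the term."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length norm: shrinks as the field gets longer."""
    return 1.0 / math.sqrt(num_terms)


print(tf(1), tf(4))                      # more occurrences => larger tf
print(idf(1, 1000), idf(500, 1000))      # rarer term => larger idf
print(field_norm(4), field_norm(16))     # shorter field => larger norm
```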

Understanding the score

GET /_search?explain

{
"query": {
"match": {
"tweet": "honeymoon"
}
}
}

Output

{
// Metadata
"_index": "us",
"_type": "tweet",
"_id": "12",
"_score": 0.076713204,
"_source": { ...trimmed... },
// Shard / node info: the score is calculated per shard (rather than per index)
"_shard": 1,
"_node": "asdoaskfnadsf3412",
"_explanation": {
"description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
"value": 0.076713204,
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"value": 1,
"details": [
{
"description": "termFreq=1.0",
"value": 1
}
]
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"value": 0.25
}
]
}
}
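For this single-term query, the top-level value appears to be the product of the tf, idf, and fieldNorm values in the explanation. The check below uses only the numbers shown in the output above; the variable names are hypothetical, and the tolerance allows for the 32-bit floats in the response.

```python
import math

tf_value = 1.0          # tf(freq=1.0)
idf_value = 0.30685282  # idf(docFreq=1, ...)
field_norm = 0.25       # fieldNorm(doc=0)

# Multiply the three factors to recover the reported weight.
score = tf_value * idf_value * field_norm

print(score)
print(math.isclose(score, 0.076713204, rel_tol=1e-6))
```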
note

The explain output can also be formatted as YAML, which is easier to read: add format=yaml to the query string

Fielddata

  • The values of a field, loaded into memory
    • Because uninverting the inverted index from disk is slow
warning
  • ES loads the values from every document in your index regardless of document type
  • Can consume a lot of memory for high cardinality fields
  • Usage
    • Sorting on a field
    • Aggregations on a field
    • Certain filters
    • Scripts that refer to fields
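"Uninverting" can be pictured as turning the term => documents mapping of an inverted index into a document => terms mapping, which is the shape that sorting and aggregations need. The sketch below is a toy model with a hypothetical index and a hypothetical `uninvert` helper; it does not reflect how Elasticsearch lays out fielddata internally.

```python
from collections import defaultdict

# Hypothetical inverted index for one field: term -> ids of documents containing it.
inverted_index = {
    "honeymoon": [0, 2],
    "beach": [1, 2],
    "elasticsearch": [0],
}


def uninvert(inverted):
    """Build a doc id -> terms mapping by walking every term's posting list."""
    fielddata = defaultdict(list)
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            fielddata[doc_id].append(term)
    return dict(fielddata)


fielddata = uninvert(inverted_index)
print(fielddata)
```

Every term and every document has to be visited to build the mapping, which hints at why the result is kept in memory and why high cardinality fields are costly.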